Abstract: Digital architectures for Chebyshev interpolation
are explored and a word-serial variation is proposed. These
architectures are contrasted with equispaced system
structures. Further, the Chebyshev interpolation scheme is
compared with conventional equispaced interpolation in terms of
reconstruction error and relative number of samples. It is also
shown that the use of a hybrid (or dual) analog-to-digital
converter unit can reduce system power consumption by as much
as one third of the original.

I. Introduction 

Applications such as synchronization in software defined radio
(SDR) and power-constrained sampling in sensor networks can
draw solutions from non-uniform sampling research.
Often in such pursuits, the hardware requirements and efficient
architecture design are ignored [1], [2]. Signal interpolation is
one of the underlying problems to be solved in
such applications. The Chebyshev interpolation technique [3] in
particular has been a promising non-uniform sampling and
interpolation scheme. In general, sampling on a non-uniform
grid has many advantages (see [2], [4]). For example, Runge
(see [4], pp. 155-156) demonstrated that interpolation of
equispaced signal values is non-optimal for a certain class of
functions. Sampling on the uniform grid, on the other hand,
though sometimes suboptimal, has been widely used in clock
synchronization, timing correction and sample rate conversion,
among other applications.

Fox and Parker [3] suggested two similar schemes for
Chebyshev interpolation. Neagoe et al. [1] showed that the
coefficient set of one of these interpolation schemes is the
output of a DCT (Discrete Cosine Transform) of the input
samples. Following these mathematical results, Zhu [2], Wang [5]
and Cuypers et al. [6] presented digital implementations
of the interpolation scheme. These architectures sometimes
do not utilize hardware efficiently or are specific to an output
node set. With the objective of designing a more flexible
structure, this paper explores the merits and demerits of Chebyshev
interpolation from an implementation perspective. A systolic
array based Chebyshev interpolation architecture for a window
of 8 samples is designed which is word-serial in nature (unlike
the previously suggested structures). A sampling scheme is
also proposed involving a SAR (Successive Approximation
Register) ADC (Analog to Digital Converter) and a flash ADC,
combined into a Flash-SAR hybrid converter block. By suitably
sharing the samples between these ADCs, Chebyshev sampling is
performed at approximately 30-40% lower power consumption.

Section II revisits the mathematical basis of Chebyshev
interpolation. Digital structures are explored and a new one
is proposed in Section III. Details which make Chebyshev
interpolation a viable alternative to equispaced interpolation
are presented in Section IV, and finally Section V summarises
the theme and contribution of this paper.

II. Theory of Chebyshev Interpolation 

Chebyshev polynomials of the first kind $\{T_n(x)\}$, $x \in [-1, 1]$,
can be defined recursively as

$$T_{n+1}(x) = 2x\,T_n(x) - T_{n-1}(x) \qquad (1)$$

where the first three polynomials are

$$T_0(x) = 1$$
$$T_1(x) = x$$
$$T_2(x) = 2x^2 - 1$$

The $k$-th zero of an $n$-th order polynomial $T_n(x)$ is given by

$$x_k = \cos\left(\frac{(2k-1)\pi}{2n}\right), \quad k = 1, \dots, n \qquad (5)$$
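The recursion of Equation (1) and the zeros of Equation (5) can be
checked numerically. The following is a minimal sketch in plain
Python; the function names are illustrative and not taken from the
paper.

```python
import math


def chebyshev_values(n_max, x):
    """Return [T_0(x), ..., T_n_max(x)] using the three-term recurrence."""
    values = [1.0, x]  # T_0(x) = 1, T_1(x) = x
    for _ in range(2, n_max + 1):
        # T_{n+1}(x) = 2x T_n(x) - T_{n-1}(x)
        values.append(2.0 * x * values[-1] - values[-2])
    return values[: n_max + 1]


def chebyshev_zeros(n):
    """Return the n zeros of T_n: x_k = cos((2k - 1) pi / (2n)), k = 1..n."""
    return [math.cos((2 * k - 1) * math.pi / (2 * n)) for k in range(1, n + 1)]


if __name__ == "__main__":
    zeros = chebyshev_zeros(8)
    # The gaps between neighbouring zeros shrink towards the interval edges.
    gaps = [a - b for a, b in zip(zeros, zeros[1:])]
    print("zeros:", [round(z, 4) for z in zeros])
    print("gaps: ", [round(g, 4) for g in gaps])
```

Evaluating `chebyshev_values(8, z)[8]` at each returned zero `z` gives
values that vanish to within rounding error, which is a convenient
sanity check on both functions.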
A polynomial (say $P_N(x)$) can be constructed from
$\{T_n(x)\}$ which minimizes the maximum deviation from the
exact underlying signal ([4], p. 156). To perform an $N$-th degree
polynomial approximation in $[-1, 1]$ (this interval can be
changed easily), the sample points $x_k$ should be chosen at
the roots of $T_{N+1}(x)$. This leads to a non-uniform grid which
is denser at the edges and sparser towards the center. It can be
shown that the polynomial $P_N(x)$ is a linear combination of
$T_0, \dots, T_{N-1}$ for which the coefficients are the DCT (Discrete
Cosine Transform) of the sample values (sampled at $x_k$) [1]:

$$P_N(x) = \sum_i c_i\,\tilde{T}_i(x) \qquad (6)$$

where $\tilde{T}_i(x)$ are the normalized Chebyshev polynomials:
$\tilde{T}_0(x)$ is a scaled copy of $T_0(x)$, while

$$\tilde{T}_{N>0}(x) = T_{N>0}(x) \qquad (7)$$

and $\{c_i\} \propto \mathrm{DCT}\left([f(x_1) \dots f(x_{N+1})]^T\right)$.
We can rewrite this in matrix form. From Equations (6) and (8),

$$P_N(x) = [f(x_0) \dots f(x_N)]\,C^T\,T \qquad (11)$$

Here $T$ is the matrix of coefficients of powers of $x$ of the
Chebyshev polynomials in decreasing order. In the general
Lagrange interpolation case, $C^T T$ can be replaced by a matrix
$L$ representing coefficients of Lagrange polynomials.
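The transform-domain form (coefficients first, then a weighted sum
of Chebyshev polynomials) can be sketched in a few lines of Python.
This is an illustration under stated assumptions, not the paper's
exact scaling. It assumes n samples taken at the zeros of $T_n$, a
DCT-II scaled by 2/n, and a weight of 1/2 on the $T_0$ term. This is
one common normalization, and the one used in the paper may differ by
a constant factor. All function names are hypothetical.

```python
import math


def chebyshev_nodes(n):
    """Zeros of T_n, used as sampling points (k = 1..n)."""
    return [math.cos((2 * k - 1) * math.pi / (2 * n)) for k in range(1, n + 1)]


def dct_coefficients(samples):
    """DCT-II of the samples, scaled by 2/n (assumed convention)."""
    n = len(samples)
    return [
        (2.0 / n)
        * sum(
            f_k * math.cos(j * (2 * k - 1) * math.pi / (2 * n))
            for k, f_k in enumerate(samples, start=1)
        )
        for j in range(n)
    ]


def interpolate(coeffs, x):
    """Evaluate P(x) = c_0 / 2 + sum_{j >= 1} c_j T_j(x)."""
    t_prev, t_curr = 1.0, x  # T_0(x), T_1(x)
    total = 0.5 * coeffs[0]  # normalized T_0 term (assumed factor 1/2)
    for c_j in coeffs[1:]:
        total += c_j * t_curr
        t_prev, t_curr = t_curr, 2.0 * x * t_curr - t_prev
    return total


if __name__ == "__main__":
    def f(x):
        return math.sin(4 * x) + 0.5 * math.sin(8 * x)

    nodes = chebyshev_nodes(8)
    coeffs = dct_coefficients([f(x_k) for x_k in nodes])
    print("P(0.3) =", interpolate(coeffs, 0.3), " f(0.3) =", f(0.3))
```

With 8 samples the interpolant reproduces any polynomial of degree 7
or lower exactly, and it passes through the sample values at the
nodes for any input signal.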

III. Architectures for Chebyshev Interpolation 
A. Prior Art 

Two systolic arrays for Chebyshev interpolation by Zhu et
al. [2] are based on transform-domain and time-domain descriptions
of the interpolation operation, respectively. The distinction
between time and transform domain structures is based on
which summations in the interpolation formula (Equation (11))
are done first. In the first array, the set of coefficients $c_i$ is
computed first and then their product with $T_i(x)$ is carried out.
In the second, the DCT of the Chebyshev polynomials generates
the set of Chebyshev Type Interpolation Functions (CTIF)
$\{\phi_i(x)\}$, which are then multiplied with $\{f(x_i)\}$,
i.e., Equation (11) is rewritten as

$$P_N(x) = \sum_{i=0}^{N} f(x_i)\,\phi_i(x)$$

where $\phi_i(x)$ is calculated using Equations (9) and (10):

$$\phi_i(x) = \sum_k \alpha_k\,T_k(x)\cos\left(\frac{k\pi(2i+1)}{2(N+1)}\right) \qquad (12)$$

A disadvantage of this structure is that the input is
assumed to arrive in parallel. Thus the multiplications which
could otherwise have been scheduled over time are now
done simultaneously, reducing hardware utilization
efficiency.
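The time-domain ordering of the summations can be illustrated with a
short Python sketch of the interpolation functions of Equation (12).
Here `num_samples` stands for N + 1, the nodes are indexed
i = 0, ..., N, and the weights $\alpha_0 = 1/(N+1)$ and
$\alpha_{k>0} = 2/(N+1)$ are an assumed normalization chosen so that
each function equals 1 at its own node and 0 at the others. The names
are illustrative only.

```python
import math


def nodes(num_samples):
    """x_i = cos((2i + 1) pi / (2 num_samples)), i = 0..num_samples - 1."""
    return [
        math.cos((2 * i + 1) * math.pi / (2 * num_samples))
        for i in range(num_samples)
    ]


def chebyshev_values(count, x):
    """Return [T_0(x), ..., T_{count-1}(x)] via the recurrence."""
    values = [1.0, x]
    while len(values) < count:
        values.append(2.0 * x * values[-1] - values[-2])
    return values[:count]


def ctif(i, x, num_samples):
    """phi_i(x) = sum_k alpha_k T_k(x) cos(k pi (2i + 1) / (2 num_samples))."""
    t = chebyshev_values(num_samples, x)
    total = 0.0
    for k in range(num_samples):
        alpha_k = (1.0 if k == 0 else 2.0) / num_samples  # assumed weights
        angle = k * math.pi * (2 * i + 1) / (2 * num_samples)
        total += alpha_k * t[k] * math.cos(angle)
    return total


def interpolate_time_domain(samples, x):
    """P(x) = sum_i f(x_i) phi_i(x): samples are weighted directly."""
    n = len(samples)
    return sum(f_i * ctif(i, x, n) for i, f_i in enumerate(samples))
```

Because each `ctif(i, ...)` depends only on the output point and not
on the data, the functions can be precomputed for a fixed output node
set, which is what makes this ordering attractive when the samples
are available in parallel.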

A structure similar to the previous ones is proposed by Wang
et al. [5]. It assumes that another set of Chebyshev
sampled values (with order $M \neq N$) is output from the existing
ones, and the hardware has been optimized with this in view,
making it unusable for an arbitrary output node set.

Cuypers et al. [6] also propose two architectures. The first
uses a fast DCT block and employs a fast adder for the Chebyshev
recursive relations of Equation (1). No insight is given into the
computational load per clock cycle, simultaneous use of adders
or other implementation details. The second scheme
suggests the use of a Farrow structure and a CORDIC unit
to perform interpolation, assuming that the input signal is in the
$\theta$ domain rather than the $x$ domain ($x = \cos(\theta)$). Computation
of the Farrow structure coefficients is not explained.
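The appeal of the $\theta$ domain follows from a standard identity,
restated here for reference as a textbook fact rather than as a
detail taken from [6]:

```latex
T_n(\cos\theta) = \cos(n\theta), \qquad
x_k = \cos\theta_k, \quad
\theta_k = \frac{(2k-1)\pi}{2n}, \quad k = 1, \dots, n
```

The Chebyshev nodes of Equation (5) are therefore equispaced in
$\theta$, with a constant step of $\pi/n$, which is the setting in
which equispaced structures such as the Farrow structure apply.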

B. Proposed Scheme 

The assumption that all the samples are available at the
same time is not practical. An implementation which is
word-serial rather than word-parallel in nature would better utilize
hardware and not require more buffering of samples than
necessary. With this in view, a design which makes use of
the word-serial property of the input and reduces the overall
count of multiply and add units is proposed. Portions of the
computation (Equation (11)) are performed as samples arrive
one by one. For instance, $\{c_i\}$, the set of coefficients in
the interpolation formula of Equation (8), is computed using the
word-serial systolic array shown in Figure 1. The Chebyshev
polynomials are also computed according to Equation (1) at an
arbitrary node set by rescheduling a pair of multiply and add
units (IIR filtering). Maximum resource usage is guaranteed
(i.e., Hardware Usage Efficiency = 100%) for both the
computation of the coefficients and the subsequent FIR
filtering (multiplication and summation). Though the first part
of the total system could be optimized by using any of the fast
DCT architectures available ([7], [8]), a generalized systolic
array performing matrix-vector multiplication has been used
to keep the structure independent of the output node set. The
transformation matrix $T$ of the 2D dependence graph (DG),
which was obtained by choosing the desired sequence of inputs
and outputs, is $T = [1\ 0]$, and the schedule vector ($s$) which
allows the reuse of multiply and add units is $[1\ 1]^T$.
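The word-serial idea, in which every arriving sample updates the
partial sums of all coefficients, can be modelled in software as
follows. This is a behavioural sketch only: it does not reproduce the
cell-level schedule of the systolic array in Figure 1, and the class
name, the 2/n scaling and the node indexing k = 1, ..., n are
assumptions made for illustration.

```python
import math


class WordSerialCoefficientUnit:
    """Behavioural model: n multiply-accumulate cells, one sample per cycle."""

    def __init__(self, n):
        self.n = n
        self.acc = [0.0] * n  # one accumulator per coefficient c_j
        self.k = 0            # number of samples consumed in this window

    def push(self, sample):
        """Consume the next sample f(x_k) and update every accumulator once."""
        self.k += 1
        for j in range(self.n):
            # DCT weight for coefficient j and sample k (assumed 2/n scaling).
            angle = j * (2 * self.k - 1) * math.pi / (2 * self.n)
            self.acc[j] += (2.0 / self.n) * math.cos(angle) * sample
        return self.k == self.n  # True once the window is complete

    def coefficients(self):
        """Return a copy of the accumulated coefficients c_0 .. c_{n-1}."""
        return list(self.acc)


if __name__ == "__main__":
    unit = WordSerialCoefficientUnit(8)
    window = [math.cos((2 * k - 1) * math.pi / 16) for k in range(1, 9)]
    for value in window:  # feeding f(x) = x, sample by sample
        window_done = unit.push(value)
    print(window_done, [round(c, 6) for c in unit.coefficients()])
```

No sample has to be buffered: each one is used in n multiply-accumulate
operations in the cycle it arrives and can then be discarded.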

For both coefficient generation and Chebyshev polynomial
evaluation, multiplexers with timing control are used
to achieve the correct flow of data. For example, the output
of the Chebyshev polynomial evaluation unit has to switch
between IIR filter mode and connecting '0' and input 'x' to the
output when $\tilde{T}_0(x)$ and $\tilde{T}_1(x)$ are evaluated in each
$N$-cycle period. Note that since the normalized Chebyshev function
values are needed, the recursions are slightly different from
Equation (1). For example, even though $\tilde{T}_0(x)$ is only a
scaled copy of $T_0(x)$ and $\tilde{T}_{n>0}(x) = T_{n>0}(x)$, the
recursion formulae for $\{T_i(x)\}$ will not work for
$\{\tilde{T}_i(x)\}$. Specifically, they fail at the
step where $\tilde{T}_2(x)$ is computed as
$2x\tilde{T}_1(x) - \tilde{T}_0(x)$, since
$\tilde{T}_2(x) = T_2(x) = 2xT_1(x) - T_0(x)$ and $\tilde{T}_0(x)$ is
different from $T_0(x)$.
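One way to picture the multiplexed evaluation is as a generator that
keeps the unnormalized $T_0$ and $T_1$ in its recursion state and
only scales what it emits in the first cycle. The sketch below is a
software analogy, not the circuit itself. The default scale factor
`scale0 = 1/sqrt(2)` is a hypothetical placeholder for the
normalization of $\tilde{T}_0$, chosen to match an orthonormal DCT.

```python
import math


def normalized_chebyshev_stream(x, count, scale0=1.0 / math.sqrt(2.0)):
    """Yield the normalized values for T_0 .. T_{count-1}, one per cycle."""
    t_prev, t_curr = 1.0, x  # recursion state holds the true T_0 and T_1
    for cycle in range(count):
        if cycle == 0:
            yield scale0 * t_prev  # mux selects the scaled constant
        elif cycle == 1:
            yield t_curr           # mux passes the input x straight through
        else:
            # IIR mode: T_{n+1} = 2x T_n - T_{n-1}, using unscaled state.
            t_prev, t_curr = t_curr, 2.0 * x * t_curr - t_prev
            yield t_curr


if __name__ == "__main__":
    print(list(normalized_chebyshev_stream(0.5, 4)))
```

Keeping the unscaled $T_0$ in the state is what avoids the failure
described above: the recursion for $T_2$ never sees the scaled value.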

A tabulation of multiplications, additions and computations
per cycle required by the proposed word-serial architecture,
compared to Zhu's systolic arrays, is provided in Table I.
Solutions by Wang et al. [9] and Cuypers et al. have not been
compared because the former optimizes for a specific output
node set and the latter does not discuss the structures at an
implementation level.

IV. Advantages of Chebyshev sampling 

A. Reduction of interpolation error 

Two characteristic signals are taken to investigate the
effectiveness of the interpolation scheme. In addition to a
bandlimited signal, a non-bandlimited signal ($e^{x}\sin(8x)$)
is also chosen. When the number of samples is 8 and the
bandlimited function is a sinusoid plus its second harmonic
($\sin(4x) + 0.5\sin(8x)$, $x \in [-1, 1]$), the interpolation error
for the equispaced case is approximately 4 times that of the
Chebyshev case, as shown in Figure 2. A similar result is obtained
for the non-bandlimited case, as shown in Figure 3.
Fig. 1. Part of the proposed structure, working on word-serial data $f(x_i)$ from a data converter to get the coefficients.
[Output labels in the figure: $c_{n-1}, \dots, c_2, c_1, c_0$ (to be multiplied with $T_i(x)$)]

TABLE I

Computations needed in different structures (assuming a set
of 8 samples).

Fig. 2. In general, more samples are required for the equispaced case
in the same interval to reduce the interpolation error.
[Legend: Chebyshev interpolation using 8 samples (MSE = 1.1%); uniform interpolation using 8 samples (MSE = 4.7%); the exact function]

Fig. 3. A non-bandlimited function $e^{x}\sin(8x)$ over the interval $[-1, 1]$
is interpolated using both schemes.



B. Use of Hybrid ADC for power savings 

Power savings can be achieved during sampling in a
Chebyshev-based interpolation system through the use of two (dual)
data converters (ADCs) as a hybrid. When the interpolation
error limit is fixed for the equispaced and Chebyshev cases,
the number of samples required to meet it also becomes fixed.
In some cases, as seen in Section IV-A, the equispaced system
requires more samples than the Chebyshev system. For the
Chebyshev system, a scheme is proposed where the samples
are split between two ADCs, one of which is faster but
power-hungry (flash) and the other slower but
power-saving (SAR). To do so, a flash ADC and a SAR ADC are bundled
together with a timing control unit to make a hybrid unit.
A simple strategy to split the samples between the flash and
SAR converters is based on whether the ratio of the intersample
intervals is greater than the SAR sampling and conversion time (i.e.,
$T_{SAR} < \lfloor \sin(kc) / \sin(c) \rfloor$, where $c$ is a constant
derivable from Equation (5)). Table II shows the power savings as a
function of the sharing of samples between the two ADCs for the two
example signals of Section IV-A. Note that the flash ADC topology is
assumed to be thermometric (not necessarily the case) and the
power consumption per comparison in either case is taken to
be 1 arbitrary unit (a.u.).
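The splitting rule can be sketched as follows. From Equation (5),
the gap between consecutive Chebyshev instants in a window of n
samples is $2\sin(k\pi/n)\sin(\pi/(2n))$, so samples near the middle
of the window leave more time for a slow conversion. The sketch
assumes windows that repeat back to back, a SAR conversion time
`t_sar` expressed in the same units as the window $[-1, 1]$, a
thermometric flash converter using $2^b - 1$ comparisons per sample
and a SAR converter using $b$ comparisons per sample. The parameter
values in the usage example are hypothetical and are not intended to
reproduce Table II.

```python
import math


def sample_times(n):
    """Chebyshev sampling instants in one window [-1, 1], in increasing order."""
    return sorted(math.cos((2 * k - 1) * math.pi / (2 * n)) for k in range(1, n + 1))


def split_samples(n, t_sar, window=2.0):
    """Label each sample 'sar' if the gap to the next sample fits a SAR conversion."""
    times = sample_times(n)
    assignment = []
    for idx, t in enumerate(times):
        # The last sample of a window is followed by the first of the next one.
        nxt = times[idx + 1] if idx + 1 < n else times[0] + window
        assignment.append("sar" if nxt - t >= t_sar else "flash")
    return assignment


def comparisons(assignment, bits):
    """Comparisons per window, at 1 a.u. each."""
    flash_cost, sar_cost = 2 ** bits - 1, bits
    return sum(sar_cost if a == "sar" else flash_cost for a in assignment)


if __name__ == "__main__":
    plan = split_samples(8, t_sar=0.25)  # hypothetical conversion time
    hybrid = comparisons(plan, bits=4)   # hypothetical resolution
    flash_only = comparisons(["flash"] * 8, bits=4)
    print(plan, hybrid, flash_only, 1.0 - hybrid / flash_only)
```

In this example the five samples around the centre of the window go
to the SAR converter and the three samples at the window edges stay
with the flash converter.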

C. Chebyshev Interpolation and Farrow Structures 

Even though Chebyshev interpolation using a Farrow
structure is described in [6], it assumes that the input signal is in
the $\theta$ domain. It is worthwhile to compare the structures for
Chebyshev interpolation with the Farrow structure,
which is widely used for equispaced interpolation even though
this kind of polynomial interpolation is ill-conditioned. Farrow
structure based interpolation units have recently been adapted to
perform some special non-uniform interpolations [10], but they do
not lend themselves to Chebyshev interpolation because the
inter-sample intervals are too diverse in range. The DCT shortcut
for the Chebyshev case has made this scheme comparable to the
Farrow shortcut [11] based equispaced case.

TABLE II

Sampling power savings per window are approximately 39% and 44%,
respectively, for each of the signals.

D. Design summary and applications 

The non-uniformly spaced nodes in Chebyshev
interpolation require block processing, which can be a disadvantage
for applications where latency is critical. Further, the sampling
times are not only irregular but also cause the sampling
intervals to be irrational ratios of each other. This implies
that there will always be an error in the sampling time, even if
a very high frequency timing clock is used. An analysis of the
optimal number of sample points to be taken in a Chebyshev
window has not been done; the number was fixed at 8. Nevertheless,
this parameter has an effect on the flatness of the system
frequency response and on the interpolation error. From an
implementation perspective, latency would increase with an
increase in this parameter. In a broader context, as with Chebyshev
sampling, seeking out optimal node sets contingent on the
class of signals at the system input could lead to minimal power
consumption and reconstruction errors. But such systems, like
Chebyshev hardware, will need extra logic for compatibility
with existing equispaced systems. For Chebyshev sampling
to work well, the average sampling rate should be at least two
times the $f_{max}$ of the input signal [1], but this is
indeed the case in most DSP systems, where the equispaced
sampling rate is chosen to be 10-15 times $f_{max}$ as a rule of
thumb.
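The residual timing error mentioned above can be quantified with a
small experiment: snap each ideal Chebyshev instant to the nearest
edge of a uniform timing clock and record the worst deviation. The
clock periods used here are hypothetical, and the window is taken as
$[-1, 1]$ in normalized time.

```python
import math


def quantize_to_clock(times, clock_period):
    """Snap each ideal sampling instant to the nearest clock edge."""
    return [round(t / clock_period) * clock_period for t in times]


def worst_timing_error(n, clock_period):
    """Largest |ideal - snapped| over the n Chebyshev instants of one window."""
    ideal = [math.cos((2 * k - 1) * math.pi / (2 * n)) for k in range(1, n + 1)]
    snapped = quantize_to_clock(ideal, clock_period)
    return max(abs(a - b) for a, b in zip(ideal, snapped))


if __name__ == "__main__":
    for period in (1e-2, 1e-3, 1e-4):
        print(period, worst_timing_error(8, period))
```

The error is bounded by half a clock period but does not reach zero
for these clock rates, because the ideal instants do not fall on the
clock grid.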

Sampling clock synchronization in DSL modems, timing
correction, power-efficient sensor networking and sample rate
conversion in software defined radio are some of the areas
where Chebyshev interpolation can be used (fractional delay
filters are already being used). Chebyshev interpolation is
superior when accuracy is important. It also fits well with
signal compression schemes such as DCT compression (where
only the subset of coefficients containing most of the energy is
retained) [2].

V. Conclusions 

A digital architecture performing Chebyshev interpolation,
based on systolic arrays and assuming word-serial data input, is
implemented in detail and contrasted with other architectures
proposed in the literature. This structure has a latency of $2N$
cycles between the input and the interpolated output. Merits
of Chebyshev sampling compared to equispaced sampling are
then explored. A scheme for Chebyshev sampling using a
Flash-SAR hybrid ADC unit, which results in power savings, is
also discussed. By optimally distributing the share of
samples taken by each type of ADC, the power
consumption of the sampling system is optimized. Once the
samples have been obtained (sequentially), they are fed to
the word-serial systolic array for interpolation in the digital
domain. Finally, the paper echoes the point that inexpensive
digital computation can allow for system-specific optimal node
sets (not just Chebyshev and equispaced), leading to arbitrary
precision in signal interpolation.